Pathogenic/likely pathogenic mutations identified in Vietnamese children diagnosed with autism spectrum disorder using high-resolution SNP genotyping platform

Among the most prevalent neurodevelopmental disorders, Autism Spectrum Disorder (ASD) is highly diverse, showing a broad phenotypic spectrum. ASD is also associated with a broad range of mutations, both de novo and inherited. In this study, we used a proprietary SNP genotyping chip to analyze the genomic DNA of 250 Vietnamese children diagnosed with ASD. Our Single Nucleotide Polymorphism (SNP) genotyping chip directly targets more than 800 thousand SNPs in the genome. Our primary focus was to identify pathogenic/likely pathogenic mutations that are potentially linked to more severe symptoms of autism. We identified and validated 23 pathogenic/likely pathogenic mutations in this initial study. The data show that these mutations were detected in several cases spanning multiple biological pathways. Among the confirmed SNPs, mutations were identified in genes previously known to be strongly associated with ASD, such as SLCO1B1, ACADSB, TCF4, HCP5, MOCOS, SRD5A2, MCCC2, DCC, and PRKN, while several other mutations are known to be associated with autistic traits or other neurodevelopmental disorders. Some mutations were found in multiple patients, and some patients carried multiple pathogenic/likely pathogenic mutations. These findings contribute to the identification of potential targets for therapeutic solutions in what is considered a genetically heterogeneous neurodevelopmental disorder.


Ethics declaration
Prior to the start of this study and enrollment of research participants, the Ethics Committee of Hue Central Hospital in Vietnam approved this study. All methods were performed in accordance with the relevant local guidelines and regulations.
Prior to collection of the saliva samples, written informed consent was signed by the parents of 254 children diagnosed with Autism Spectrum Disorder, allowing the authors of this study to use the saliva samples and all related medical data for research, publication, and biobanking purposes.

DNA extraction
The genomic DNA was extracted from the collected saliva using the Chemagic Prime™ Robot. The process is entirely automated, using chemagen's patented M-PVA Magnetic Bead technology for DNA and RNA purification with liquid handling to provide high-throughput automated isolation of ultra-pure nucleic acids. The process is monitored in accordance with the quality control requirements of ISO/IEC 17025.

Genotyping with the GFWv3 custom high-resolution arrays
The Axiom workflow was run on GeneTitan instruments (manufacturer: Thermo Fisher Scientific; catalog number: 00-0373; model: GeneTitan MC) with wrappers for the Analysis Power Tools (APT) genotype quality control tool (apt-geno-qc) and genotype calling tool (apt-probeset-genotype). A tool for SNP metric calculations and a tool to convert the output into Plink format for downstream genomic analyses were also used. Samples were registered in a custom file format for a batch of 96 ".CEL" files from a single Axiom plate, together with a few auxiliary file formats specific to APT tools, to facilitate the file selection process in Galaxy. The Galaxy workflow starts by receiving its input of ".CEL" and ".ARR" files from the instrumentation computer. It proceeds by extracting the ".CEL" files and executing the quality control tool with a user-specified Dish Quality Control (QC) threshold (0.82 by default). The names of the samples that pass QC are passed on to the genotyping tool, along with the ".CEL" dataset, for the first round of genotyping. The output from this first round contains, among other metrics, the call rate for each sample. The samples with a call rate above a user-specified threshold (97% by default), along with the ".CEL" dataset again, are the input for the second iteration of genotyping. The final genotype calling report is then annotated with the phenotype data and converted into Plink format, and simultaneously processed by the SNP metrics tool to calculate statistics such as call rate (CR), Fisher's linear discriminant (FLD), FLD calculated for the homozygous genotype clusters (HomFLD), and minor allele count.
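The two-stage sample selection described above can be sketched in a few lines of Python. This is only an illustration of the filtering logic, not the Galaxy wrappers or the APT tools themselves: the `Sample` class, the function name, and the example values are hypothetical, and treating a value exactly equal to a threshold as passing is an assumption.

```python
from dataclasses import dataclass

DQC_THRESHOLD = 0.82         # default Dish QC threshold stated in the text
CALL_RATE_THRESHOLD = 97.0   # default call-rate threshold (percent) stated in the text


@dataclass
class Sample:
    """Hypothetical per-sample record; not an APT or Galaxy data structure."""
    name: str
    dqc: float        # Dish QC value from the quality control step
    call_rate: float  # call rate (percent) from the first genotyping round


def select_for_second_round(samples,
                            dqc_threshold=DQC_THRESHOLD,
                            call_rate_threshold=CALL_RATE_THRESHOLD):
    """Return the names of samples that go into the second genotyping round."""
    # Step 1: keep only samples that pass the Dish QC threshold.
    passed_qc = [s for s in samples if s.dqc >= dqc_threshold]
    # Step 2: of those, keep samples whose first-round call rate is high enough.
    # (In the real workflow, call rates only exist for samples that passed step 1.)
    return [s.name for s in passed_qc if s.call_rate >= call_rate_threshold]


# Hypothetical example values, not data from this study.
example = [
    Sample("sample_01", dqc=0.95, call_rate=99.1),
    Sample("sample_02", dqc=0.80, call_rate=99.5),  # fails Dish QC
    Sample("sample_03", dqc=0.90, call_rate=95.0),  # fails call rate
]
selected = select_for_second_round(example)
```

Both thresholds are exposed as parameters because the text describes them as user-specified, with 0.82 and 97% only as defaults.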
The GFWv3 custom array is a high-resolution Affymetrix SNP array consisting of 2.5 million (2.5 M) probes to assay for SNPs and CNVs, with 800,000 direct targets and two million more through imputation. We designed and validated this array using both inter-assay (reproducibility ≥ 99.8%) and intra-assay (reproducibility ≥ 99.8%) tests. Manufacturer: Thermo Fisher Scientific. Each kit has three parts: the Axiom™ GFWv3 96-well Plate (part number 551159), the Axiom™ GeneTitan Consumables Kit (part number 901606), and the Axiom™ 2.0 Reagent Kit (part number 901758).

Variant validation
Identified variants were validated by Sanger sequencing and amplification-refractory mutation system polymerase chain reaction (ARMS-PCR). Each unique mutation found in this study was validated using Sanger sequencing. When several samples were found to carry the same mutation, we used Sanger sequencing to verify the mutation in one sample and, in parallel, ARMS-PCR for the other samples carrying the same mutation. This became a more cost-effective and accurate workflow as we expanded the analyses to more common variants. The principle of validating identified variants with ARMS-PCR is shown in Supplemental Fig. 1. The list of primers designed for PCR and Sanger sequencing is included in Supplemental Table 1.
These oligos were designed for the identified variants using the Primer-BLAST tool of NCBI (https://www.ncbi.nlm.nih.gov/tools/primer-blast/) and PRIMER1 (http://primer1.soton.ac.uk/primer1.html). The same DNA samples used for genotyping in this study were also used for the validation process.
PCR amplification, Sanger sequencing, and ARMS PCR for variant verification.
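The allocation rule described in this section (Sanger sequencing for one carrier of each unique mutation, ARMS-PCR for the remaining carriers) can be illustrated with a short Python sketch. The function name, the variant labels, and the sample labels are hypothetical, and choosing the first listed carrier for Sanger sequencing is an arbitrary assumption; the text does not say how that sample was chosen.

```python
def assign_validation_methods(carriers_by_variant):
    """Map each (variant, sample) pair to a validation method.

    carriers_by_variant: dict of variant label -> list of carrier sample labels.
    The first listed carrier is assigned to Sanger sequencing (an arbitrary
    choice made for this illustration); all other carriers go to ARMS-PCR.
    """
    plan = {}
    for variant, samples in carriers_by_variant.items():
        for index, sample in enumerate(samples):
            plan[(variant, sample)] = "Sanger" if index == 0 else "ARMS-PCR"
    return plan


# Hypothetical labels, not variants or samples from this study.
example_carriers = {
    "variant_A": ["sample_01", "sample_02", "sample_03"],
    "variant_B": ["sample_04"],
}
validation_plan = assign_validation_methods(example_carriers)
```

Under this rule, every unique mutation gets exactly one Sanger-sequenced sample, which matches the statement that each unique mutation was validated by Sanger sequencing.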

Demographic and clinical characteristics of the research participants in this study
All 254 children recruited as research participants were from Central Vietnam and are current patients of Hue Central Hospital. We did not collect or analyze DNA from the parents, as the initial results showed that all mutations found were heterozygous. The children were diagnosed with ASD using the DSM-IV. The male/female ratio in this study is 5.86 (Table 1). This is higher than the previously estimated ratio in the general population, which is closer to 4 8,11,15-17. This could be due to several factors, such as the willingness to participate in the study skewing the ratio towards more males than expected. Furthermore, the fact that females are often diagnosed with ASD less frequently or later than males, or have different, less obvious symptoms compared to males 18, might contribute to this result.
Among the 254 samples collected, four samples were degraded, yielding a minimal amount of DNA or low-quality DNA. The families decided not to recollect these four samples. For the 250 samples that passed QC, the genotyping results were analyzed using the Axiom Analysis Suite 5.1 (Thermo Fisher).
All the children in this study have speech delays, characterized by no word being spoken by the age of 16 months, and other communication issues such as not responding to their name when called, avoiding eye contact, or avoiding interacting with others, as reported by their parents (Table 1 and Sup. Table 3).
Mutations identified were categorized as pathogenic/likely pathogenic according to the ClinVar database (https://www.ncbi.nlm.nih.gov/clinvar/). A total of 23 pathogenic/likely pathogenic mutations were identified in this study, including 12 missense mutations, 5 stop-gained mutations, 3 frameshift mutations, 1 splicing mutation, and 2 in non-coding transcripts (Table 2). A few variants showed conflicting interpretations (pathogenic to benign). For instance, rs1799990 in the PRNP gene was classified as pathogenic/risk factor/likely benign. However, based on the high frequency of occurrence of this SNP (also shown in this study), it was classified as a likely benign variant.
Multiple pathogenic mutations were identified in single individuals. Several cases with a very severe ASD diagnosis were found to be carriers of multiple mutations. For instance, AUT4875 is the carrier of 4 different mutations, two in the ZGRF1 gene, one in the SCN9A gene, and another in the HCP5 gene; AUT4870 carries mutations in the RIPK1 and SCN9A genes (Table 3).
In the two predominant databases for genes associated with ASD, SFARI and AutDB, only 9 mutations were previously reported among the total of 23 unique pathogenic/likely pathogenic mutations identified. However, many of the identified genes are implicated in ASD and other neurological disorders as discussed [...] 33 and PRKN 34. The ratio of males/females among the samples with identified mutations is 3.93 (Sup. Table 3).

Discussion
To date, hundreds of potential genetic alterations have been identified as associated with autism, though with no defining cohort. What has been determined is that there is likely shared pathophysiology for neurodevelopmental disorders and that autism is along a continuum between intellectual disability and schizophrenia 35. Since ASD is multigenic and heterogeneous and can occur in conjunction with other neurological conditions, it is difficult to discern the genes that are responsible for the disease phenotypes 36. Previous studies showed consistent results for two classes of proteins, those involved in synapse formation and those involved in transcriptional regulation and chromatin-remodeling pathways 37.
From 250 Vietnamese children diagnosed with varying degrees of autism spectrum disorder, our high-resolution SNP array data identified both rare and common SNPs previously known to be associated with ASD. We then validated these data and provided information regarding their frequency among Southeast Asians.
Of the confirmed SNPs, 7 were shared among several of the children tested; these were SNPs in HCP5, SRD5A2, PRNP, ZGRF1, SCN9A, and LOC107987057. rs2395029 in HCP5 has not previously been identified as an autism-associated variant, though there is increasing evidence that immune-related genes, such as HCP5, and immune dysregulation are associated with neurodevelopmental disorders 29. Recent studies have shown that HCP5 rs2395029 is in complete linkage disequilibrium with HLA-B*5701 19, which is a risk allele for intellectual disability 20. Several alleles of HLA genes have been reported to be associated with autism, intellectual disability, and schizophrenia 20. Using Fisher's exact test, the frequency of this mutation among the cases in this study is significantly higher than its frequency in the East Asian population (p-value = 0.000046) (Sup. Table 4). In terms of SRD5A2, a recent study found that ASD boys with the rs9282858 mutation in this testosterone metabolism-related gene showed higher levels of restricted and repetitive behaviors 31. In addition, multiple recent studies published on ClinVar have consistently classified SRD5A2 rs9332964 as pathogenic/likely pathogenic. The frequency of this mutation among the cases in this study is also significantly higher than its frequency in the general East Asian population (p-value = 0.00013) (Sup. Table 4). Recent studies showed that key proteins in the metabolism of [...]

Table 2. Pathogenic/likely pathogenic variants identified in this study.

Many of the mutations in the GJB2 gene have been reported to be the causative mutation for hearing impairment 22.
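As a rough illustration of the kind of comparison reported above for HCP5 rs2395029 and SRD5A2 rs9332964, the following Python sketch computes a two-sided Fisher's exact test on a 2x2 table of allele counts (cases versus a reference population). It is a minimal standard-library implementation written for this illustration, not the software used in the study, and the counts in the usage example are hypothetical; the actual counts are in Sup. Table 4.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: group (e.g. cases, reference population).
    Columns: allele status (e.g. variant allele, other allele).
    Returns the p-value: the total probability of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denominator = comb(total, col1)

    def table_probability(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so that ties with the observed table are included.
    return sum(
        p
        for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-7)
    )


# Hypothetical allele counts, NOT taken from this study:
# 12 variant alleles among 500 case alleles versus 10 among 1170 reference alleles.
p_value = fisher_exact_two_sided(12, 488, 10, 1160)
```

Fisher's exact test is a natural choice when some cells of the table are small, as is typical for rare variants, because it does not rely on a large-sample approximation.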

AUT4661, AUT4643: FAM98C (rs201037487). FAM98C is an ASD candidate gene 24.
AUT4875, AUT4861, AUT1045, AUT4904, AUT4916, AUT4647, AUT1020: ZGRF1 (rs76187047). ZGRF1 is related to hot water epilepsy (HWE).
SCN9A is highly expressed in the hypothalamus, a brain region that has been implicated in autism. In addition to its role in GABAergic neurotransmission, the role of SCN9A in autism might be mediated through changes in hypothalamic functions, which in turn can affect multiple hormonally regulated processes that are frequently disrupted in autism, such as oxytocin-mediated social interactions.

[...] 38. ZGRF1 encodes a protein with functions related to motor praxis that is highly expressed in the cerebellum, raising the possibility that disrupted ZGRF1 may interfere with cerebellar function 39,40. Two ZGRF1 variants detected as compound heterozygotes in 7 ASD patients in this study, rs61745597 and rs76187047, have been identified as potential genetic causes of childhood apraxia of speech (CAS), which is prevalent in approximately 25-30% of children with ASD 41. Both variants result in missense mutations, so it is logical to expect disruption of the ZGRF1 protein in both instances. Because CAS is a complex disorder, ZGRF1 is unlikely to be the sole causative gene target, but it is one more piece in the puzzle when narrowing down potential therapeutic targets for ASD and its associated disorders 40. Mutations in the primary central nervous system sodium channels are associated with neurological, psychiatric, and neurodevelopmental disorders, including autism. SCN9A has been indicated to be important for normal brain function, and variants in this gene are involved in familial autism 42. LOC107987057 or C9orf72 variants have been linked to several neurological disorders, including ASD 43. However, chi-square analyses showed that the frequencies of these 4 variants (LOC107987057 rs2814707, ZGRF1 rs61745597, ZGRF1 rs76187047, and SCN9A rs12478318) observed in our study are not significantly different from those in the East Asian population in the 1000 Genomes Project 30x (p-value = 0.27, 0.82, 0.82, and 0.21, respectively) (Sup. Table 4). In addition, given the high frequency of the minor allele, the association of these variants with ASD should be reconsidered.
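The chi-square comparison of allele frequencies mentioned above can be sketched as follows. This is a plain Pearson chi-square test on a 2x2 table with one degree of freedom and no continuity correction, which is an assumption; the text does not state which variant of the test was used. The function name and the counts in the usage example are hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Rows: group (e.g. cases, reference population).
    Columns: allele (e.g. minor allele, major allele).
    Assumes every row and column total is non-zero.
    Returns (chi-square statistic, p-value) for 1 degree of freedom.
    """
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    total = a + b + c + d

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-square survival function reduces to
    # erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical allele counts, NOT taken from this study:
# 150 minor alleles among 500 case alleles versus 330 among 1170 reference alleles.
statistic, p_value = chi_square_2x2(150, 350, 330, 840)
```

A large p-value from such a test, as reported for the four variants above, means the case and reference allele frequencies are statistically indistinguishable at the sample sizes involved.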
Confirmed variants in the following GJB2 were seen in five or six children, respectively. GJB2 is best known for its link to non-syndromic genetic sensorineural hearing loss in children and has not previously been identified as having an association with ASD 22.
Confirmed variants in RIPK1 44, CAPN3 45, KAT6A 46, TACR3 47, GJB2 48, FAM98C 24, PRKN, SLC3A1, CUBN, and PYGM were seen in one or two children in this study. Interestingly, rs751037529 in PRKN has been identified as a pathogenic variant associated with early-onset Parkinson's disease 49. Also, PRKN knock-out mice show autistic-like behaviors, giving weight to PRKN as a potential candidate gene for ASD 34. Previous studies have also shown that the disruption of genes that encode large amino acid transporters, like SLC3A1, increases the risk of ASD 50. These abnormalities in large amino acid transporters can affect the utilization of certain amino acids and their availability during brain development, resulting in an increased risk of ASD. Along that line, rs143944436 in the CUBN gene identified in this study introduces a premature stop codon, resulting in a nonfunctional protein. The CUBN gene provides instructions for making a protein called cubilin, which is involved in the uptake of vitamin B12 from food into the body, pointing to vitamins and their bioavailability as potential treatment avenues for individuals with ASD.
Considering the vast number of inherited, common, and rare genetic variants that have been associated with ASD, the etiology is complicated, to say the least. This study specifically assessed a cohort of Southeast Asian children with varying degrees of ASD compared to a control population to identify those variants which may be potential diagnostic or therapeutic targets.
This study provides an initial step towards understanding the genetic underpinnings of ASD in Southeast Asian populations. We view these data as a contribution towards identifying the loci which contribute to ASD, and we anticipate that some of these loci will eventually have sufficient evidence to become established, robust ASD risk loci. Some genetic variants correlate with a very high risk of disease, while most do not. In its simplest terms, a polygenic risk score (PRS), sometimes called a polygenic score (PGS) or a genetic/genomic risk score (GRS), reflects the overall genetic predisposition to a disease based on the sum of all known and common variants linked to that disease 51. This study has focused on pathogenic/likely pathogenic mutations. As a next step, the study will be extended with the analysis of more samples, covering not only SNPs but also copy number variants, and with the assessment of the contribution of more common variants using the polygenic risk score.
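To make the idea of a polygenic risk score concrete, the sketch below computes it in the form it is commonly given: a weighted sum of risk-allele counts, where each weight is a per-variant effect size. The weighting scheme is an assumption for illustration (the text only describes the score as a sum over known common variants), and the variant labels, weights, and genotype are hypothetical placeholders rather than values from this study.

```python
def polygenic_risk_score(allele_counts, effect_weights):
    """Weighted sum of risk-allele counts.

    allele_counts: dict of variant label -> number of risk alleles (0, 1, or 2)
                   carried by one individual.
    effect_weights: dict of variant label -> per-allele effect size
                    (for example a log odds ratio from an association study).
    Variants without a known weight are ignored.
    """
    return sum(
        effect_weights[variant] * count
        for variant, count in allele_counts.items()
        if variant in effect_weights
    )


# Hypothetical placeholders, NOT variants or effect sizes from this study.
weights = {"variant_A": 0.10, "variant_B": -0.05, "variant_C": 0.30}
individual = {"variant_A": 2, "variant_B": 1, "variant_C": 0, "variant_D": 1}
score = polygenic_risk_score(individual, weights)
```

In practice a raw score like this is only interpretable relative to the score distribution in a reference population of matching ancestry.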

AUT number; Gene; Related diseases
PRKN is implicated in brain development. The loss of the gene in mice results in autistic-like behaviors, accompanied by altered neuronal activity and abnormalities in synapse formation and synaptic molecular composition.
AUT1039: SLC3A1 (rs200483989). Thirteen of these genes belong to the SLC family, of which SLC1A3, SLC32A1 and SLC38A7 are strong candidates for schizophrenia (SCZ). Most disease-associated genes in the gene sets belong to the SLC family, implying the important role of SLC in ASD and SCZ.
AUT1050: PYGM (rs114073621). Glycogen storage disease V. Protein levels of PYGM and RAC1, a small GTPase that regulates PYGM activity, are reduced in the astrocytes in schizophrenia 25.
Table 3. Genes in which pathogenic/likely pathogenic mutations were identified in this study. AUT: autism case number.

Table 1. Demographic and clinical characteristics of the participants in this study. All 254 children diagnosed with ASD and ASD-like behavior are shown in this table by gender and age (mean in years, with standard deviation (SD) in parentheses). Delayed speech means impairment of verbal communication, one of the diagnostic criteria for ASD outlined in DSM-IV.